The Logic of Adaptive Behavior

Authors

  • Martijn van Otterlo
  • Luis Borges
Abstract

Abstraction is one of the most important tools of the computer scientist. In fact, complex systems cannot be designed and implemented without it. In general, abstraction simplifies elements of a complex system so that complexity is lowered without decreasing performance significantly. In artificial intelligence and machine learning, its main goal is to lower the complexity of reasoning and learning without losing the ability to deal effectively with the task. In this chapter, we focus on abstraction mechanisms used in the context of MDPs. The learning algorithms from the previous chapter do not scale up well to larger state spaces. Various abstractions proposed in the literature exploit the inherent structure in MDPs so that even very large problems can be solved. Structure can be found in symmetries and equivalence classes in state and action spaces, in task hierarchies, and in regularities in transition models and reward functions. Using abstractions creates a need to perform structural induction and deduction in parallel with value and policy learning algorithms. The main purpose of this chapter is to highlight the principles of abstraction in MDPs and the interplay between abstractions and RL algorithms. We introduce PIAGET as an extension of GPI in the face of abstraction. Furthermore, we distinguish five types of abstraction for the solution of MDPs; the same underlying principles can be found in the following chapters covering relational representations.

The Markov decision process (MDP) framework has become a de facto standard method for learning in sequential decision problems in which a performance metric is available to optimize decision making under uncertainty. It provides a general modeling framework for many interesting tasks such as (probabilistic) planning, game-playing and goal-seeking behavior.
In principle, all (fully observable) tasks for which a (numerical) performance measure is available that can evaluate behavior can be solved in the MDP framework. The previous chapter introduced MDPs and solution algorithms based on explicit state and action representations. However, computing solutions for systems at the most fine-grained level is often inconvenient and inefficient. If we analyze humans learning and reasoning about complicated tasks, we see that they use abstraction and generalization techniques that enable them to see and use the inherent structure present in many tasks (see Baum, 2004). For example, consider a board game such as CHESS, GO, or CHECKERS. When explaining the game and its purpose to new players, we do not explain each board position individually and which actions are possible in each situation, simply because most games have enormous state spaces. In fact, the state space of even the simple game of TIC-TAC-TOE already contains (roughly) 6000 states. Instead, the game is explained in terms of general rules, properties of states and abstract goals. For TIC-TAC-TOE one would explain that there are two types of symbols (one for each player), that there are 9 squares, that each player has to put his or her symbol on an empty square, and that there are lines on the board such that when one player fills one line entirely with his or her symbols, he or she wins the game (i.e. gets a reward +1).

Figure 3.1: Symmetries in TIC-TAC-TOE. An optimal move for X is denoted ∗ in each board position. All four positions with their corresponding optimal action are equivalent when taking rotational symmetries into account. The actions (∗) in these positions create so-called fork positions. The opponent O cannot block both lines of X's, so X will win in the next move.
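The size of the TIC-TAC-TOE state space, and the reduction that symmetry makes possible, can be checked by brute-force enumeration. The following is a minimal sketch and not code from the chapter: the function names (`winner`, `reachable_states`, `canonical`, and so on) are hypothetical, X is assumed to move first, play is assumed to stop as soon as a line is completed, and reflections are counted as symmetries in addition to the rotations mentioned in Figure 3.1.

```python
# Illustrative sketch; all names are hypothetical, not from the chapter.
# A board is a tuple of 9 cells in row-major order: 'X', 'O' or ' ' (empty).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return 'X' or 'O' if that player has filled a line, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def reachable_states():
    """Enumerate all positions reachable from the empty board.

    Assumes X moves first and that play stops once a player has won.
    """
    start = (' ',) * 9
    seen = {start}
    stack = [start]
    while stack:
        board = stack.pop()
        if winner(board) is not None:
            continue  # terminal position: no further moves
        empties = [i for i, cell in enumerate(board) if cell == ' ']
        # X is to move when an odd number of squares is still empty.
        player = 'X' if len(empties) % 2 == 1 else 'O'
        for i in empties:
            nxt = board[:i] + (player,) + board[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rotate(board):
    """Rotate the board by 90 degrees."""
    return tuple(board[3 * (2 - c) + r] for r in range(3) for c in range(3))


def reflect(board):
    """Mirror the board left to right."""
    return tuple(board[3 * r + (2 - c)] for r in range(3) for c in range(3))


def symmetric_variants(board):
    """Return the 8 images of a board under rotations and reflections."""
    variants = []
    for _ in range(4):
        variants.append(board)
        variants.append(reflect(board))
        board = rotate(board)
    return variants


def canonical(board):
    """Pick one fixed representative of a board's symmetry class."""
    return min(symmetric_variants(board))


if __name__ == "__main__":
    states = reachable_states()
    classes = {canonical(s) for s in states}
    print(len(states), "reachable positions")
    print(len(classes), "positions up to symmetry")
```

Under these assumptions the enumeration yields 5478 reachable positions, in line with the "roughly 6000" quoted above, and 765 symmetry classes. A value or an optimal action stored for one canonical board then covers every position in its class, which is the kind of generalization the figure illustrates.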
A suitable policy for playing the game makes use of the lines on the board, and patterns of symbols on the board. TIC-TAC-TOE contains a considerable amount of symmetry such that knowledge about an optimal action in one position can be generalized to other positions that are similar with respect to these symmetries, see Figure 3.1. Humans can master the game fairly quickly by making use of the patterns on the board and symmetries between board positions. There is no need to see all possible positions to compute an optimal strategy, provided that one makes use of abstraction and generalization. The field of artificial intelligence (AI) (Russell and Norvig, 2003; Görtz et al., 2003; Luger, 2002) shows a wide variety of abstraction and generalization techniques that can be used to highlight structure in problem domains. Knowledge representation (KR) (Markman, 1999; Sowa, 1999; Brachman and Levesque, 2004) formalisms play an important role in representing and reasoning about problems in compact, comprehensible and efficient ways. The field of machine learning (Langley, 1996; Mitchell, 1997; Alpaydin, 2004) provides many ways for the induction of compact hypotheses that generalize over problem instances. All these approaches can be used in the context of MDPs. Many types of abstraction can be used for compact representations of states, actions, transition and reward functions, policies, and task structures, such that representing and learning can be performed on conceptually higher levels than that of individual states and actions (see for example Bertsekas and Tsitsiklis, 1996; Sutton, 1997; Sutton and Barto, 1998; Boutilier, 1999; Boutilier et al., 1999). Furthermore, generalization techniques such as function approximators can learn compact mappings from states to values. The algorithms in the previous chapter do not scale up to arbitrarily large problems, due to the sheer size of state spaces, which is even infinite for continuous state spaces. 
In this chapter we will discuss various abstraction, generalization and KR techniques used in the context of MDPs. We focus on methods that use MDPs and RL algorithms as their main components. It is unlikely that a general-purpose algorithm can be developed that approximates solutions to arbitrary sequential decision problems. Instead, there will more likely be a whole range of algorithms that exploit different types of

Similar resources

An Adaptive Learning Game for Autistic Children using Reinforcement Learning and Fuzzy Logic

This paper presents an adapted serious game for rating social ability in children with autism spectrum disorder (ASD). The required measurements are obtained through challenges in the proposed serious game. The proposed serious game uses reinforcement learning concepts to be adaptive. It is based on fuzzy logic to evaluate the social ability level of children with ASD. The game adapts itsel...

Designing an adaptive fuzzy control for robot manipulators using PSO

This paper presents the design of an optimal adaptive controller for tracking control of robot manipulators based on the particle swarm optimization (PSO) algorithm. The PSO algorithm has been employed to optimize the parameters of the controller and hence to minimize the integral square of errors (ISE) as a performance criterion. In this paper, an improved PSO using logic is proposed to increase the convergenc...

Sensorless Model Reference Adaptive Control of DFIG by Using High Frequency Signal Injection and Fuzzy Logic Control

In this paper, a new sensorless model reference adaptive method is used for direct control of active and reactive power of the doubly fed induction generator (DFIG). In order to estimate the rotor speed, a high frequency signal injection scheme is implemented. In this study, to improve the accuracy of speed estimation, two methods are suggested. First, the coefficients of proportional-integral ...

Doppler and bearing tracking using fuzzy adaptive unscented Kalman filter

The topic of Doppler and Bearing Tracking (DBT) problem is to achieve a target trajectory using the Doppler and Bearing measurements. The difficulty of DBT problem comes from the nonlinearity terms exposed in the measurement equations. Several techniques were studied to deal with this topic, such as the unscented Kalman filter. Nevertheless, the performance of the filter depends directly on the...

Intuitionistic fuzzy logic for adaptive energy efficient routing in mobile ad-hoc networks

In recent years, mobile ad-hoc networks have been used widely due to advances in wireless technology. These networks can be formed in any environment where they are needed, without a fixed infrastructure or centralized management. Mobile ad-hoc networks have characteristics and advantages such as wireless medium access, multi-hop routing, low-cost development, dynamic topology, etc. In these netwo...

Adaptive fuzzy sliding mode and indirect radial-basis-function neural network controller for trajectory tracking control of a car-like robot

The ever-growing use of various vehicles for transportation, on the one hand, and the statistics of soaring road accidents resulting from human error, on the other hand, reminds us of the necessity to conduct more extensive research on the design, manufacturing and control of driver-less intelligent vehicles. For the automatic control of an autonomous vehicle, we need its dynamic...

Publication date: 2008